                              README Notes
                    QLogic bnx2x VMware ESX Driver

                          QLogic Corporation

                 Copyright (c) 2015 QLogic Corporation
                           All rights reserved


Table of Contents
=================

  Introduction
  Driver Dependencies
  Driver Settings
  Driver Parameters
  Driver Defaults
  Unloading and Removing Driver
  Driver Messages
  Dual Media Support
  Memory Limitation
  MSI-X Vector Limitations
  MultiQueue/NetQueue
  VMDirectPath I/O pass-through Support
  SR-IOV Support
  VXLAN Support
  Notes

Introduction
============

This file describes the bnx2x VMware ESX driver for the QLogic QLE84xx/34xx/74xx  
57710/57711/57711E/57712/57712_MF/57800/57800_MF/57810/
57810_MF/57840/57840_MF 10Gb PCIE Ethernet Network Controllers.

Driver Dependencies
===================

Please install the entire driver set from the same QLogic driver bundle
zipfile.  Mixing and matching drivers from different driver bundles is not
allowed and will cause problems.

Driver Settings
===============

The bnx2x driver settings can be queried and changed using ethtool. 

Some ethtool examples:

1. Show current speed, duplex, and link status:

   ethtool vmnic0

2. Change speed, duplex, autoneg:

Example: 100Mbps half duplex, no autonegotiation:

   ethtool -s vmnic0 speed 100 duplex half autoneg off

Example: Autonegotiation with full advertisement:

   ethtool -s vmnic0 autoneg on

Example: Autonegotiation with 100Mbps full duplex advertisement only:

   ethtool -s vmnic0 speed 100 duplex full autoneg on

3. Show flow control settings:

   ethtool -a vmnic0

4. Change flow control settings:

Example: Turn off flow control

   ethtool -A vmnic0 autoneg off rx off tx off

Example: Turn flow control autonegotiation on with tx and rx advertisement:

   ethtool -A vmnic0 autoneg on rx on tx on

   Note that this is only valid if speed is set to autonegotiation.

5. Show offload settings:

   ethtool -k vmnic0

6. Change offload settings:

Example: Turn off TSO (TCP segmentation offload)

   ethtool -K vmnic0 tso off

7. Get statistics:

   ethtool -S vmnic0

8. Perform self-test:

   ethtool -t vmnic0

   Note that the interface (vmnic0) must be up to do all tests.

9. See ethtool man page for more options.


Driver Parameters
=================

Several optional parameters can be supplied as a command line argument
to the insmod or modprobe command. These parameters can also be set in
modprobe.conf. See the man page for more information.

The optional parameter "int_mode" is used to force an interrupt mode other
than MSI-X. By default, the driver will try to enable MSI-X if it is supported
by the kernel. If MSI-X is not attainable, the driver will try to enable MSI
if it is supported by the kernel. If MSI is also not attainable, the driver
will fall back to legacy INTx mode. On some old kernels it is impossible to
use MSI if the device has previously used MSI-X, and impossible to use MSI-X
if the device has previously used MSI; in these cases a system reboot in
between is required.

Set the "int_mode" parameter to 1 as shown below to force using the legacy
INTx mode on all QLogic QLE84xx/34xx/74xx NICs in the system.

   vmkload_mod bnx2x int_mode=1


Set the "int_mode" parameter to 2 as shown below to force using MSI mode
on all QLogic QLE84xx/34xx/74xx NICs in the system.

   vmkload_mod bnx2x int_mode=2


The optional parameter "disable_tpa" can be used to disable the
Transparent Packet Aggregation (TPA) feature. By default, the driver
aggregates TCP packets; set this parameter to disable the feature.

Set the "disable_tpa" parameter to 1 as shown below to disable the TPA
feature on all QLogic QLE84xx/34xx/74xx NICs in the system.

   vmkload_mod bnx2x disable_tpa=1

Use ethtool (if available) to disable TPA (LRO) for a specific QLogic 
QLE84xx/34xx/74xx NIC.

The optional parameter "num_queues" may be used to set the number of queues
when the interrupt mode is MSI-X. If the interrupt mode is different from
MSI-X (see the "int_mode" parameter), the number of queues will be set to 1,
discarding the value of this parameter.
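For illustration only, the parameter can be supplied at load time; the queue
count below is a made-up example value, not a recommendation:

```shell
# Hypothetical example: request 4 queues at module load time.
# Only takes effect when the driver is running in MSI-X mode.
vmkload_mod bnx2x num_queues=4
```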

The optional parameter "dropless_fc" can be used to enable a complementary
flow control mechanism on the 57711, 57711E, 57712, or 578xx. The default flow
control mechanism is to send pause frames when the on-chip buffer (BRB)
reaches a certain level of occupancy. This is a performance-targeted flow
control mechanism. On the 57711, 57711E, 57712, or 578xx, one can enable
another flow control mechanism that sends pause frames when one of the host
buffers (when in RSS mode) is exhausted. This is a "zero packet drop" targeted
flow control mechanism.

Set the "dropless_fc" parameter to 1 as shown below to enable the dropless
flow control mechanism on all QLogic 57711 or 57711E NICs in the system. The
parameter also works on 57712 and 578xx devices when the DCBX feature is
disabled, or when the DCB protocol has negotiated pause flow control with the
link partner.

   vmkload_mod bnx2x dropless_fc=1

The optional parameter "autogreeen" can be used to force specific AutoGrEEEN
behavior. By default, the driver uses the NVRAM settings per port, but if the
module parameter is set, it overrides the NVRAM settings to force AutoGrEEEN
to either active (1) or inactive (2). The default value of 0 uses the NVRAM
settings.

The optional parameter "native_eee" can be used to force specific EEE
behavior. By default, the driver uses the NVRAM settings per port, but if the
module parameter is set, it forces EEE to be enabled, and the value is used as
the idle time required prior to entering Tx LPI. The default value of -1
forcefully disables EEE; a value of 0 uses the NVRAM settings.
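As a sketch, both parameters could be supplied together at load time; the
values below are arbitrary examples (including the Tx LPI idle time, whose
units are not specified here), not recommendations:

```shell
# Hypothetical example: force AutoGrEEEN active and force EEE on with an
# illustrative idle time before entering Tx LPI.
vmkload_mod bnx2x autogreeen=1 native_eee=2000
```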

There are some more optional parameters that can be supplied as a command line
argument to the insmod or modprobe command. These optional parameters are
mainly to be used for debug and may be used only by an expert user.

The debug optional parameter "poll" can be used for timer based polling.
Set the "poll" parameter to the timer polling interval on all QLogic
QLE84xx/34xx/74xx NICs in the system.

The debug optional parameter "mrrs" can be used to override the MRRS
(Maximum Read Request Size) value of the HW. Set the "mrrs" parameter to
the desired value (0..3) on all QLogic QLE84xx/34xx/74xx NICs in the system.

The debug optional parameter "debug" can be used to set the default
msglevel on all QLogic QLE84xx/34xx/74xx NICs in the system. Use "ethtool -s"
to set the msglevel for a specific QLogic QLE84xx/34xx/74xx NIC.
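A hedged example combining the debug parameters described above; the values
shown are arbitrary illustrations for expert use only:

```shell
# Hypothetical debug invocation: MRRS index 3 and a nonzero default msglevel.
vmkload_mod bnx2x mrrs=3 debug=0x1
```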


Driver Defaults
===============

Speed :                    Autonegotiation with all speeds advertised

Flow control :             Autonegotiation with rx and tx advertised

MTU :                      1500 (range 46 - 9000)

Rx Ring size :             4078 (range 0 - 4078)

Tx Ring size :             4078 (range (MAX_SKB_FRAGS+4) - 4078)

                            MAX_SKB_FRAGS varies on different kernels and
                            different architectures. On a 2.6 kernel for
                            x86, MAX_SKB_FRAGS is 18.

Coalesce rx usecs :        25 (range 0 - 2880)

Coalesce tx usecs :        50 (range 0 - 2880)

MSI-X :                    Enabled (if supported by 2.6 kernel)

TSO :                      Enabled

WoL :                      Disabled


Unloading and Removing Driver
=============================

To unload the driver, do the following:

   vmkload_mod -u bnx2x

Driver Messages
===============



Driver signon:
-------------

QLogic QLE84xx/34xx/74xx 10Gigabit Ethernet Driver bnx2x 2.711.10.v[50,55].4 ($DateTime: 2014/08/20 02:32:40 $)


NIC detected:
------------

vmnic0: QLogic 57840 XGb (A1) PCI-E x8 2.5GHz found at mem e8800000, IRQ 16, node addr 001018360012


MSI-X enabled successfully:
--------------------------

bnx2x: vmnic0: using MSI-X


Link up and speed indication:
----------------------------

bnx2x: vmnic0 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON


Link down indication:
--------------------

bnx2x: vmnic0 NIC Link is Down

Memory Limitation:
--------------------

Note: you may see messages similar to the following in the vmkernel logs:

Dec  2 18:24:20 ESX4 vmkernel: 0:00:00:32.342 cpu2:4142)WARNING: Heap: 1435: Heap bnx2x already at its maximumSize. Cannot expand.
Dec  2 18:24:20 ESX4 vmkernel: 0:00:00:32.342 cpu2:4142)WARNING: Heap: 1645: Heap_Align(bnx2x, 4096/4096 bytes, 4096 align) failed.  caller: 0x41800187d654
Dec  2 18:24:20 ESX4 vmkernel: 0:00:00:32.342 cpu2:4142)WARNING: vmklinux26: alloc_pages: Out of memory

The error messages above indicate that the ESX host is severely strained
because the global SKB heap has been exhausted.

Another indication of memory pressure can be found in the driver
statistics, using the 'ethtool -S <vmnic #>' command.  If the
'rx_skb_alloc_discard' counter is incrementing, this can also indicate
memory pressure.

To roughly calculate the global SKB heap usage one can use the
following formula:

(Number of Ports) x (Number of Queues per Port) x (Number of RX Queue entries).

Please refer to the VMware documentation on the Global SKB heap sizes
for each ESX release.
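As a sketch of the formula above, here is a quick shell calculation for a
hypothetical host with two bnx2x ports, eight NetQueues per port, and the
default RX ring size of 4078 entries (all values are illustrative):

```shell
# Rough global SKB heap pressure estimate (counts RX buffers, not bytes),
# using the formula: ports x queues per port x RX queue entries.
ports=2
queues_per_port=8
rx_ring_entries=4078
skb_buffers=$((ports * queues_per_port * rx_ring_entries))
echo "Approximate RX SKBs pinned: ${skb_buffers}"
```

Comparing this figure against the global SKB heap size documented for your
ESX release indicates how much headroom remains.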

One can tune the bnx2x driver to try to remedy this situation:

1.  Reduce the RX ring size.  This can be done using the following
    ethtool commands: 

    To query the current ring sizes:

       ethtool -g <vmnic #>

    To set new RX ring size:

        ethtool -G <vmnic #> rx <new rx ring size>

2.  Another method to relieve the memory pressure on the global SKB heap
    is to tune the NetQueue parameters by reducing the number of
    NetQueues per port.  (Please see the MultiQueue/NetQueue section
    for assistance on how to tune the NetQueue parameters.)
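Putting step 1 into practice, a hypothetical invocation that shrinks the RX
ring on vmnic0; the new size is an example only, not a recommendation:

```shell
# Query current ring sizes, then reduce the RX ring to lower SKB heap usage.
ethtool -g vmnic0
ethtool -G vmnic0 rx 2048
```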


MSI-X Vector Limitations
--------------------

VMware ESX 6.0 can only manage a fixed number of MSI-X vectors.  Once this
limit has been reached, the following message is seen in the vmkernel logs:

Interrupt allocation failed with Out of resources

More information can be found in the following VMware KB article:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2050783


MultiQueue/NetQueue:
--------------------

The optional parameter "num_queues" may be used to set the number of Rx and Tx
queues when the interrupt mode is MSI-X. If the interrupt mode is different
from MSI-X (see the "int_mode" parameter), the number of Rx and Tx queues will
be set to 1, discarding the value of this parameter.

By default, the driver will try to use the optimal number of NetQueues,
which is the number of NetQueues that matches the number of CPUs on the
machine.

If users would like to explicitly force the number of queues, users may
set the number of NetQueues via the following command:

   esxcfg-module -s "num_queues=<num of queues>" bnx2x

Users can also let the bnx2x driver choose the number of NetQueues
automatically via the following command:

   esxcfg-module -s "num_queues=0" bnx2x

To disable NetQueue, force the number of queues to 1 if using MSI-X mode:

   esxcfg-module -s "num_queues=1" bnx2x

or use a different interrupt mode:

Using legacy INTx:
   esxcfg-module -s "int_mode=1" bnx2x

Using MSI:
   esxcfg-module -s "int_mode=2" bnx2x

With the examples above, please reboot the machine for the parameters to take
effect.

Multiple RX Filters per NetQueue:

Each non-default RX queue can be programmed with one or more MAC filters.
Packets matching the filter criteria on a queue are DMAed into receive
buffers allocated for that queue.  Any RX frame not matching any RX filter is
delivered to the default RX queue.  If an RX NetQueue has multiple applied
RX filters, then any RX frame matching any of those filters is delivered
to that RX NetQueue.  This is an advantage when there are more VMs with
virtual NICs than there are NetQueues.

The number of RX filters supported per NetQueue is calculated by determining
the total number of filters supported by the hardware and dividing them evenly
among the NetQueues.  The number of RX filters can be controlled via the
vmkernel module parameter "multi_rx_filters" using the following values:

  -1: use the default number of RX filters
   0: disable the use of multiple RX filters
   1..Max: force the number of RX filters to use per NetQueue; if this
        exceeds the number of filters supported by the hardware, the
        hardware limit will be used.
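For illustration, a hypothetical setting that caps each NetQueue at two RX
filters (the value is an example only):

```shell
# Limit each non-default RX NetQueue to 2 MAC filters; reboot for the
# parameter to take effect.
esxcfg-module -s "multi_rx_filters=2" bnx2x
```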

VMDirectPath I/O pass-through Support:
--------------------
VMDirectPath I/O pass-through is supported if all the functions of a device
are assigned to a single VM.  PDA (physical device assignment) is not
supported.  By default, BCM57710/BCM57711/BCM57712 devices are marked as
non-shareable in the default /etc/vmware/passthru.map file.

BCM57800/BCM57810/BCM57840 devices also do not fully support being used as
Full Passthrough Shareable devices; all the functions of such a device must be
passed through to the same VM.  However, the default /etc/vmware/passthru.map
file that is part of the inboxed files does not mark these devices as not
supporting Full Passthrough Shareable, so the VMware configuration GUI will
allow passing through functions to different VMs.  Please note that this is
not supported at this time.

Additional information about VMDirectPath I/O pass-through support can be
found in the following VMware KB article:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1010789


SR-IOV Support:
--------------------

Starting with ESX5.5, SR-IOV is supported for bnx2x.  To configure SR-IOV
for bnx2x devices, please complete the following:

1.  Update the BIOS to an SR-IOV capable BIOS

2.  Enable SR-IOV from the BIOS

3.  Enable SR-IOV from the CCM

4.  Load the bnx2x module with the vmkernel module parameter 'max_vfs' to
    enable SR-IOV

The 'max_vfs' module parameter is a comma-separated list that gives the
number of VFs for each of the bnx2x functions.  The order in the list is the
bnx2x probe order.  The values in the list are defined as follows:

 0: SR-IOV is not enabled for this function
>0: number of VFs the driver should create for this function

For example:

vmkload_mod bnx2x max_vfs='0,0,4,8,0,0'

This enables SR-IOV for the third and fourth bnx2x functions, with four and
eight VFs created respectively.

Please note that by default SR-IOV is disabled for each of the functions and
must be explicitly enabled.

For further information and background on VMware ESX 5.5 SR-IOV support please
refer to the following location:

http://pubs.vmware.com/vsphere-55/index.jsp#com.vmware.vsphere.networking.doc/GUID-CC021803-30EA-444D-BCBE-618E0D836B9F.html?resultof=%2522%2553%2552%252d%2549%254f%2556%2522%2520

Please use the following information from VMware as the minimum requirements
and restrictions for ESX5.5 SR-IOV support:

http://pubs.vmware.com/vsphere-55/index.jsp#com.vmware.vsphere.networking.doc/GUID-E8E8D7B2-FE67-4B4F-921F-C3D6D7223869.html?resultof=%2522%2553%2552%252d%2549%254f%2556%2522%2520

SR-IOV Restrictions:

*  For this release, users must use the virtual NIC as the SR-IOV passthru
type.  Using the ESX5.1 method of assigning the VF as a PCI passthru device
is not supported.  This is because ESX5.5 has new SR-IOV policy enforcements
which are not compatible with the old ESX5.1 passthru method.  Further
information on how to configure the VM and ESX host can be found at the
following location:

http://pubs.vmware.com/vsphere-55/index.jsp#com.vmware.vsphere.networking.doc/GUID-EE03DC6F-32CA-42EF-98FC-12FDE06C0BE0.html?resultof=%2522%2553%2552%252d%2549%254f%2556%2522%2520

*  Before the PF can handle a TX timeout, an MTU change, a coalesce settings
change, or a device close, all the VFs must be inactive with the native VF
driver unloaded.  Once all the VFs for that particular function are unloaded,
the PF is allowed to be manipulated.  If there are active VFs and the PF is
manipulated, the driver will prevent the action from being taken and a warning
message will be displayed in the syslog.

*  ESX6.0 is limited in the number of MSI-X vectors it can register.  If this
limit is exceeded, the VM will crash.  To determine if this is the problem,
look in the directory of the datastore where the VM resides for the
vmware*.log files.  In the log corresponding to the instance where the VM
crashed, the following line will appear:

vcpu-0| I120: PCIPassthruChangeIntrSettings: 0c:08.6 failed to register interrupt (error code 195887110)

This indicates that the MSI-X vector used for the SR-IOV passthru device was
not successfully registered, which is what brought down the VM.

* ESX5.5 limits the number of passthru/emulated NICs per VM.  If this threshold
is exceeded, the following messages are seen in the system logs:

WARNING: NetPort: vm 69077: 827: too many vnic ports open on world VMM group
Net: 2267: can't connect <VM ethernet interface name> to <Port Group Name>: Limit exceeded

In this case the Port Group binding between the VF and the Port Group does
not work, and the VF will fail to initialize because it cannot find a valid
policy configuration.  To work around this problem, please reduce the number
of vNICs per VM.

*  Please note that bringing down an interface on a PF where VFs are active,
using the command:

   esxcli network nic down -n <vmnic #>

will not allow the end user to bring up the interface again.  This is a
problem in the vmkernel where the closed state of the device is not kept in
sync with the upper vmkernel layers.  The driver will prevent the device from
going down, but the vmkernel will think the device is down, so a later request
to bring up the device has no effect because the device is already up.

Please note that if there is a parity error, the driver will prevent the chip
from being reset to recover because of the active VMs.  To recover, the end
user will need to either:

1.  Power off all the VMs and manually reset the chip via the command:

    vsish -e set /net/pNics/<vmnic #>/reset 1

2.  Reboot the machine

*  VLAN support:

Please note that with this release, hardware VLAN filtering is not supported;
only hardware VLAN stripping is supported.  Please use the following to
determine the VLAN mode used by the PF/VF:

Port Group Setting: None
VLAN Mode:          No VLAN tagging
No VLAN tagging is supported for the VF.  Tagged packets will not reach the
guest, and tagged packets from the guest will not be allowed to go out on
the VF.

Port Group Setting: VLAN (with tag X)
VLAN Mode:          VST (VLAN Switch Tagging)
All packets going out of the VF will be tagged with VLAN ID X, and only
packets tagged with X will reach the VF.  Tag X will be stripped before the
packet reaches the VM.

Port Group Setting: VLAN Trunking
VLAN Mode:          VGT (VLAN Guest Tagging)
Supports VLAN tagging from the guest.  Both tagged and untagged packets will
reach the VM.  Tags will not be stripped before the packet reaches the guest.

Port Group Setting: Private VLAN
Not applicable to a passthrough NIC; the effect is the same as 'None'.

Known issue – SR-IOV VFs may not appear when using the vmkernel
'max_vfs' bnx2x driver parameter to enable/create VFs:
------------------
 
During boot up of a server running ESXi5.1u0 with SR-IOV enabled, the
VMKernel reallocates PCIE BARs for the VFs that are enabled by bnx2x.
Because all the PCIE BARs of the VFs of a specific port are allocated
together, any failure will cause the allocation of all the VFs on that
port to fail.

Due to a known issue in the VMKernel the reallocation of the BARs is not
optimized and may fail in several ways:
 
1)  The VFs of one port are allocated, but the VFs of the second port do
    not appear.

2)  The allocation of VF BARs fails on all ports.

When this issue happens, messages similar to the following may appear in
the vmkernel.log:

WARNING: LinPCI: vmklnx_enable_vfs:1250:unable to enable SR-IOV on PCI device 0000:05:00.1
 
When the VFs are allocated successfully the following may appear in the
vmkernel.log:
 
WARNING: LinPCI: vmklnx_enable_vfs:1246: enabling 64 VFs on PCI device 0000:05:00.0

Possible workarounds when the VFs fail to be created:

1.  Reduce the number of VFs requested using the 'max_vfs' vmkernel
    bnx2x driver parameter until the VFs are created

2.  Use a newer version of ESX

Known issue - SR-IOV VFs are not properly initialized within the guest OS

This can occur with the following configuration on a multi-port device: if
the 'max_vfs' parameter is set to '0,n', n VFs are created on PF1 while PF0
is not configured for SR-IOV mode.  If the VFs are created on a PF other than
the lowest-numbered one, ESXi will not set the ARI_Capable_Hierarchy bit.
This will cause problems with communication between the VF and the PF.

Additional information can be found in the following VMware KB article:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=2039824&sliceId=1&docTypeID=DT_KB_1_1&dialogID=847560180&stateId=1%200%20847566570

VXLAN Support:
--------------------
Starting from ESX5.1, the RSS feature is supported on VXLAN traffic.  Starting
from ESX5.5, TSO/CSUM offload and VXLAN filtering are supported on VXLAN
traffic.

The module parameter "RSS" is used to control the RSS feature.  The value
specifies the number of RSS queues to be enabled in the driver.  The maximum
number of RSS queues is 4 and the minimum is 2.  This feature is disabled by
default for ESXi5.1 and enabled by default for ESXi5.5 with 4 RSS queues.

esxcfg-module -s RSS=<number of RSS queues> bnx2x

The module parameter "enable_vxlan_ofld" is used to control the VXLAN TSO/CSUM
offload feature.  A value of 1 enables this feature and a value of 0 disables
it.  This feature is not applicable to ESXi5.1.  Enabled by default for
ESXi5.5.

esxcfg-module -s enable_vxlan_ofld=<1:enable/0:disable> bnx2x

The module parameter "disable_vxlan_filter" is used to control VXLAN filtering
for RX netqueues.  A VXLAN filter is composed of the outer MAC + inner MAC +
VXLAN ID.  A value of 1 disables this feature and a value of 0 enables it.
This feature is not applicable to ESXi5.1.  Disabled by default for ESXi5.5.

esxcfg-module -s disable_vxlan_filter=<1:disable/0:enable> bnx2x

In addition, on ESXi5.5, statistics for selected VXLAN Tx offloads are
displayed as part of "ethtool -S <vmnic_name>".

Note:
1.  RSS and NPAR/HP Flex10

    When RSS needs to be configured with NPAR or HP Flex10, where more than
    2 Ethernet functions are enumerated per adapter, please specify the total
    number of netqueues per Ethernet function using the "num_queues" module
    parameter as below:

       esxcfg-module -s "num_queues=8" bnx2x

Notes:
==================
1.  Forced speed setting when using NPAR and ESX:

    When using the VI Client to set the speed of the vmnic, the setting is
    persisted in the configuration file /etc/vmware/esx.conf.  The entries
    are stored in the tree as follows:

    /net/pnic/child[0006]/duplex = "full"
    /net/pnic/child[0006]/mac = "XX:XX:XX:XX:XX:XX"
    /net/pnic/child[0006]/name = "vmnicX"
    /net/pnic/child[0006]/speed = "1000"

    Note that the speed for this NIC is set to 1G.  Because this is persisted
    and applied on every ESX boot, the persisted speed setting will override
    any setting the user has made with the CCM, even if the CCM is used again
    to configure the speed.
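To see how such entries can be inspected, here is a runnable sketch against a
sample file that mimics the tree above; on a live host one would grep
/etc/vmware/esx.conf directly instead (path per this README):

```shell
# Build a sample file with the persisted entries shown above, then locate
# the persisted speed setting the way one would on a real esx.conf.
cat > /tmp/esx.conf.sample <<'EOF'
/net/pnic/child[0006]/duplex = "full"
/net/pnic/child[0006]/mac = "XX:XX:XX:XX:XX:XX"
/net/pnic/child[0006]/name = "vmnicX"
/net/pnic/child[0006]/speed = "1000"
EOF
grep '/speed' /tmp/esx.conf.sample
```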

Dynamic Netqueue Support
--------------------
With this feature enabled, a smaller memory footprint can be achieved.  Memory
for RX netqueues is not allocated until the queues are requested by the host,
and is freed when the RX netqueues are deallocated.  In addition, TPA and RSS
queues are no longer reserved, allowing more flexible memory usage.  Starting
from ESX5.5, this change also enables support for
VMKNETDDI_QUEUEOPS_QUEUE_FEAT_DYNAMIC, which makes it possible to change the
attributes of an RX queue at run time.  Finally, starting from ESX5.5, this
feature also allows RSS to be supported on enterprise traffic.  Thus, the
default value for the module parameter RSS is 4 for ESX5.5 and later.
